AITopics | diffusion-based model

Collaborating Authors

diffusion-based model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards Safe Concept Transfer of Multi-Modal Diffusion via Causal Representation Editing

Neural Information Processing SystemsMar-18-2026, 11:50:24 GMT

artificial intelligence, machine learning, natural language, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.61)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

Diffusion Model in Latent Space for Medical Image Segmentation Task

Ngoc, Huynh Trinh, Hai, Toan Nguyen, Son, Ba Luong, Quoc, Long Tran

arXiv.org Artificial IntelligenceDec-3-2025

Medical image segmentation is crucial for clinical diagnosis and treatment planning. Traditional methods typically produce a single segmentation mask, failing to capture inherent uncertainty. Recent generative models enable the creation of multiple plausible masks per image, mimicking the collaborative interpretation of several clinicians. However, these approaches remain computationally heavy. We propose MedSegLatDiff, a diffusion based framework that combines a variational autoencoder (VAE) with a latent diffusion model for efficient medical image segmentation. The VAE compresses the input into a low dimensional latent space, reducing noise and accelerating training, while the diffusion process operates directly in this compact representation. We further replace the conventional MSE loss with weighted cross entropy in the VAE mask reconstruction path to better preserve tiny structures such as small nodules. MedSegLatDiff is evaluated on ISIC-2018 (skin lesions), CVC-Clinic (polyps), and LIDC-IDRI (lung nodules). It achieves state of the art or highly competitive Dice and IoU scores while simultaneously generating diverse segmentation hypotheses and confidence maps. This provides enhanced interpretability and reliability compared to deterministic baselines, making the model particularly suitable for clinical deployment.

artificial intelligence, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

2512.01292

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Generative AI for Testing of Autonomous Driving Systems: A Survey

Song, Qunying, Ye, He, Harman, Mark, Sarro, Federica

arXiv.org Artificial IntelligenceAug-28-2025

Autonomous driving systems (ADS) have been an active area of research, with the potential to deliver significant benefits to society. However, before large-scale deployment on public roads, extensive testing is necessary to validate their functionality and safety under diverse driving conditions. Therefore, different testing approaches are required, and achieving effective and efficient testing of ADS remains an open challenge. Recently, generative AI has emerged as a powerful tool across many domains, and it is increasingly being applied to ADS testing due to its ability to interpret context, reason about complex tasks, and generate diverse outputs. To gain a deeper understanding of its role in ADS testing, we systematically analyzed 91 relevant studies and synthesized their findings into six major application categories, primarily centered on scenario-based testing of ADS. We also reviewed their effectiveness and compiled a wide range of datasets, simulators, ADS, metrics, and benchmarks used for evaluation, while identifying 27 limitations. This survey provides an overview and practical insights into the use of generative AI for testing ADS, highlights existing challenges, and outlines directions for future research in this rapidly evolving field.

machine learning, natural language, neural information processing system, (20 more...)

arXiv.org Artificial Intelligence

2508.19882

Country:

Europe (1.00)
Asia (0.67)
North America > United States > California (0.45)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

Elucidating Rectified Flow with Deterministic Sampler: Polynomial Discretization Complexity for Multi and One-step Models

Yang, Ruofeng, Zhu, Zhaoyu, Jiang, Bo, Chen, Cheng, Li, Shuai

arXiv.org Artificial IntelligenceAug-13-2025

Recently, rectified flow (RF)-based models have achieved state-of-the-art performance in many areas for both the multi-step and one-step generation. However, only a few theoretical works analyze the discretization complexity of RF-based models. Existing works either focus on flow-based models with stochastic samplers or establish complexity results that exhibit exponential dependence on problem parameters. In this work, under the realistic bounded support assumption, we prove the first polynomial discretization complexity for multi-step and one-step RF-based models with a deterministic sampler simultaneously. For the multi-step setting, inspired by the predictor-corrector framework of diffusion models, we introduce a Langevin process as a corrector and show that RF-based models can achieve better polynomial discretization complexity than diffusion models. To achieve this result, we conduct a detailed analysis of the RF-based model and explain why it is better than previous popular models, such as variance preserving (VP) and variance exploding (VE)-based models. Based on the observation of multi-step RF-based models, we further provide the first polynomial discretization complexity result for one-step RF-based models, improving upon prior results for one-step diffusion-based models. These findings mark the first step toward theoretically understanding the impressive empirical performance of RF-based models in both multi-step and one-step generation.

artificial intelligence, complexity, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.08735

Country: Europe > Austria (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Inflationary Flows: Calibrated Bayesian Inference with Diffusion-Based Models

Neural Information Processing SystemsMay-27-2025, 12:51:50 GMT

Beyond estimating parameters of interest from data, one of the key goals of statistical inference is to properly quantify uncertainty in these estimates. In Bayesian inference, this uncertainty is provided by the posterior distribution, the computation of which typically involves an intractable high-dimensional integral. Among available approximation methods, sampling-based approaches come with strong theoretical guarantees but scale poorly to large problems, while variational approaches scale well but offer few theoretical guarantees. In particular, variational methods are known to produce overconfident estimates of posterior uncertainty and are typically non-identifiable, with many latent variable configurations generating equivalent predictions. Here, we address these challenges by showing how diffusion-based models (DBMs), which have recently produced state-of-the-art performance in generative modeling tasks, can be repurposed for performing calibrated, identifiable Bayesian inference.

calibrated bayesian inference, diffusion-based model, inflationary flow, (1 more...)

Neural Information Processing Systems

Country: North America > United States > New Mexico > Bernalillo County > Albuquerque (0.08)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.91)

Add feedback

Towards Safe Concept Transfer of Multi-Modal Diffusion via Causal Representation Editing

Neural Information Processing SystemsMay-26-2025, 17:16:55 GMT

Recent advancements in vision-language-to-image (VL2I) diffusion generation have made significant progress. While generating images from broad vision-language inputs holds promise, it also raises concerns about potential misuse, such as copying artistic styles without permission, which could have legal and social consequences. To address these issues, researchers have explored various approaches, such as dataset filtering, adversarial perturbations, machine unlearning, and inference-time refusals. However, these methods often lack either scalability or effectiveness. In response, we propose a new framework called causal representation editing (CRE), which extends representation editing from large language models (LLMs) to diffusion-based models. CRE enhances the efficiency and flexibility of safe content generation by intervening at diffusion timesteps causally linked to unsafe concepts.

artificial intelligence, large language model, natural language, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.63)

Add feedback

A Survey on Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, and Beyond

Stoian, Mihaela Cătălina, Giunchiglia, Eleonora, Lukasiewicz, Thomas

arXiv.org Artificial IntelligenceMar-7-2025

Generative modelling has become the standard approach for synthesising tabular data. However, different use cases demand synthetic data to comply with different requirements to be useful in practice. In this survey, we review deep generative modelling approaches for tabular data from the perspective of four types of requirements: utility of the synthetic data, alignment of the synthetic data with domain-specific knowledge, statistical fidelity of the synthetic data distribution compared to the real data distribution, and privacy-preserving capabilities. We group the approaches along two levels of granularity: (i) based on the primary type of requirements they address and (ii) according to the underlying model they utilise. Additionally, we summarise the appropriate evaluation methods for each requirement and the specific characteristics of each model type. Finally, we discuss future directions for the field, along with opportunities to improve the current evaluation methods. Overall, this survey can be seen as a user guide to tabular data generation: helping readers navigate available models and evaluation methods to find those best suited to their needs.

proc, requirement, tabular data, (16 more...)

arXiv.org Artificial Intelligence

2503.05954

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Austria > Vienna (0.04)

Genre: Overview (1.00)

Industry:

Information Technology > Security & Privacy (0.46)
Health & Medicine > Therapeutic Area > Endocrinology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation

Liu, Kaiyuan, Mei, Jiahao, Zhang, Hengyu, Zhang, Yihuai, Wu, Xingjiao, Dong, Daoguo, He, Liang

arXiv.org Artificial IntelligenceOct-10-2024

Although Chinese calligraphy generation has achieved style transfer, generating calligraphy by specifying the calligrapher, font, and character style remains challenging. To address this, we propose a new Chinese calligraphy generation model 'Moyun' , which replaces the Unet in the Diffusion model with Vision Mamba and introduces the TripleLabel control mechanism to achieve controllable calligraphy generation. The model was tested on our large-scale dataset 'Mobao' of over 1.9 million images, and the results demonstrate that 'Moyun' can effectively control the generation process and produce calligraphy in the specified style. Even for calligraphy the calligrapher has not written, 'Moyun' can generate calligraphy that matches the style of the calligrapher.

calligrapher, calligraphy, dataset, (15 more...)

arXiv.org Artificial Intelligence

2410.07618

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Shanghai > Shanghai (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

Wang, Yuchi, Ren, Shuhuai, Gao, Rundong, Yao, Linli, Guo, Qingyan, An, Kaikai, Bai, Jianhong, Sun, Xu

arXiv.org Artificial IntelligenceApr-16-2024

Diffusion models have exhibited remarkable capabilities in text-to-image generation. However, their performance in image-to-text generation, specifically image captioning, has lagged behind Auto-Regressive (AR) models, casting doubt on their applicability for such tasks. In this work, we revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding. With these benefits, diffusion models can alleviate the inherent limitations of AR methods, including their slow inference speed, error propagation, and unidirectional constraints. Furthermore, we identify the prior underperformance of diffusion models stemming from the absence of an effective latent space for image-text alignment, and the discrepancy between continuous diffusion processes and discrete textual data. In response, we introduce a novel architecture, LaDiC, which utilizes a split BERT to create a dedicated latent space for captions and integrates a regularization module to manage varying text lengths. Our framework also includes a diffuser for semantic image-to-text conversion and a Back&Refine technique to enhance token interactivity during inference. LaDiC achieves state-of-the-art performance for diffusion-based methods on the MS COCO dataset with 38.2 BLEU@4 and 126.2 CIDEr, demonstrating exceptional performance without pre-training or ancillary modules. This indicates strong competitiveness with AR models, revealing the previously untapped potential of diffusion models in image-to-text generation.

arxiv, caption, diffusion model, (14 more...)

arXiv.org Artificial Intelligence

2404.10763

Country: North America > United States > Pennsylvania (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Baseball (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Add feedback

Bi-LORA: A Vision-Language Approach for Synthetic Image Detection

Keita, Mamadou, Hamidouche, Wassim, Eutamene, Hessen Bougueffa, Hadid, Abdenour, Taleb-Ahmed, Abdelmalik

arXiv.org Artificial IntelligenceApr-7-2024

Advancements in deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs), have ushered in an era of generating highly realistic images. While this technological progress has captured significant interest, it has also raised concerns about the potential difficulty in distinguishing real images from their synthetic counterparts. This paper takes inspiration from the potent convergence capabilities between vision and language, coupled with the zero-shot nature of vision-language models (VLMs). We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images. The pivotal conceptual shift in our methodology revolves around reframing binary classification as an image captioning task, leveraging the distinctive capabilities of cutting-edge VLM, notably bootstrapping language image pre-training (BLIP2). Rigorous and comprehensive experiments are conducted to validate the effectiveness of our proposed approach, particularly in detecting unseen diffusion-generated images from unknown diffusion-based generative models during training, showcasing robustness to noise, and demonstrating generalization capabilities to GANs. The obtained results showcase an impressive average accuracy of 93.41% in synthetic image detection on unseen generation models. The code and models associated with this research can be publicly accessed at https://github.com/Mamadou-Keita/VLM-DETECT.

arxiv preprint arxiv, detection, image detection, (14 more...)

arXiv.org Artificial Intelligence

2404.01959

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > France > Hauts-de-France (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report > Promising Solution (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback